ABSTRACT

The analyses of visual data by stereo and motion modules have typically
been treated as separate, parallel processes which both feed a common
viewer-centered 2.5-D sketch of the scene. When acting separately, stereo and
motion analyses are subject to certain inherent difficulties: stereo must resolve
a combinatorial correspondence problem and is further complicated by the presence
of occluding boundaries; motion analysis involves the solution of nonlinear
equations and yields a 3-D interpretation specified up to an undetermined scale
factor.

A new module is described which unifies stereo and motion analysis in a
manner in which each helps to overcome the other’s shortcomings. One important
result is a correlation between relative image flow (i.e., binocular difference
flow) and stereo disparity; it points to the importance of the ratio
$\dot\delta/\delta$, rate of change of disparity $\dot\delta$ to disparity $\delta$,
and its possible role in establishing stereo correspondence. Our formulation may
reflect the human perception channel probed by Regan and Beverley (1979).
1.  INTRODUCTION

In  decomposing  the  visual  information  processing  task  into  several  stages,  it 
is  the  intermediate  level  which  is  responsible  for  the  recovery  of  surface  shapes  in 
a scene (Marr 1982). It is often described as a set of “shape from” modules
which, acting independently and in parallel, feed a viewer-centered “2.5-D
sketch” of the visual field. Two of the most commonly studied and closely
related modules are shape from stereo (Koenderink and van Doorn 1976; Marr
and Poggio 1979; Mayhew and Frisby 1981; Prazdny 1984; Pollard et al. 1985;
Eastman and Waxman 1985) and shape from monocular motion (Koenderink and
van Doorn 1975; Ullman 1979; Prazdny 1980; Longuet-Higgins and Prazdny 1980;
Longuet-Higgins 1981; Tsai and Huang 1981a,b; Waxman and Ullman 1983;
Waxman 1984; Waxman and Wohn 1984; Wohn and Waxman 1985a,b; Subbarao and
Waxman 1985; Buxton et al. 1984). However, when acting independently, each of
these  processes  suffers  from  certain  inherent  difficulties;  stereo  is  faced  with  a 
combinatorial  correspondence  problem  plagued  by  the  presence  of  occluding 
boundaries  (Grimson  1981;  Poggio  and  Poggio  1984),  while  motion  analysis 
involves  the  solution  of  nonlinear  equations  and  leaves  the  3-D  interpretation 
specified  up  to  an  arbitrary  scale  factor  (Waxman  and  Ullman  1983).  There  is 
evidence,  however,  for  a  separate  channel  of  human  visual  processing  in  which 
stereo  and  motion  analyses  may  come  together  much  earlier  than  at  the  2.5-D 
sketch.  We  formulate  here  a  theory  of  time-varying  stereo  in  the  context  of 
“binocular  image  flows,”  where  stereo  and  motion  work  closely  in  order  to  over¬ 
come  each  other’s  shortcomings.  Central  to  our  approach  is  the  notion  of  relative 
flow (or “binocular difference flow”), representing the difference between image
velocities  of  a  feature  as  seen  in  the  left  and  right  images  separately.  Neural 
organizations  which  perform  this  “computation”  have  already  been  proposed 
(Regan  and  Beverley  1979). 

The  fusion  of  stereo  and  motion  into  a  single  module  has  been  considered 
recently  by  others  as  well.  Richards  (1983)  demonstrated  recovery  of  structure 
from  orthographic  stereo  and  motion  without  knowledge  of  the  fixation  distance. 
Jenkin  (1984)  considered  a  stereo  matching  process  driven  by  the  3-D  interpreta¬ 
tion  of  feature  point  velocities.  Waxman  and  Sinha  (1984)  proposed  a  “dynamic 
stereo”  technique  based  upon  the  relative  flow  derived  from  two  cameras  in 
known  relative  motion,  valid  in  the  limit  of  negligible  disparity.  The  question  of 
image  motion  aiding  stereo  in  the  matching  process  was  noted  by  Poggio  and 
Poggio  (1984);  and  as  will  be  shown  below,  a  correlation  between  binocular 
difference  flow  and  disparity  may  support  this  possibility. 

We suggest a decomposition of our stereo-motion module into five steps
which begins where low-level vision ends, i.e., it follows the stage of edge and
point feature extraction (and tracking over time) in the left and right images
separately.

Step 1: Monocular image flow recovery and flow segmentation of the separate left
and right image sequences utilizing the Velocity Functional Method (Waxman and
Wohn 1984) and overlap compatibility (Waxman 1984; Wohn and Waxman
1985b). This procedure allows gross correspondence to be established between
analytic flow regions in the left and right images. It also reveals the depth and
orientation discontinuities that often plague stereo matching and surface
reconstruction algorithms.

Step  2:  Establishing  correspondence  between  (previously  unmatched)  left  and 
right  image  features  according  to  a  correlation  between  binocular  difference  flow 
and  stereo  disparity.  This  process  can  be  implemented  in  parallel  over  the  bino¬ 
cular  field  of  view  in  the  context  of  “local  support”  within  neighborhoods 
(Prazdny 1984; Pollard et al. 1985; Eastman and Waxman 1985). This correlation
points to the importance of the ratio $\dot\delta/\delta$, rate of change of
disparity $\dot\delta$ to disparity $\delta$. A “rigidity assumption” for
independently moving objects in the
scene  also  enters  here. 

Step 3: Use of disparity functionals defined in overlapping neighborhoods to
recover smooth surface structure between the discontinuities detected from the
monocular flow analyses (Koenderink and van Doorn 1976; Eastman and Waxman
1985).

Step 4: Recovery of rigid body space motions corresponding to separate analytic
flow  regions  utilizing  the  determined  surface  structure  and  either  monocular 
image  flow  (or  a  cyclopean  image  flow).  Separate  surface  patches  can  then  be 
grouped  into  rigid  objects  sharing  the  same  space  motions.  This  process  entails 
solving  only  linear  equations  as  a  measure  of  its  complexity. 

Step  5:  Use  of  the  separate  image  flows  to  track  features  and  discontinuities  over 
time.  This  allows  refinement  of  disparity  estimates  to  “sub-pixel”  accuracy  by 
temporal  interpolation.  It  also  allows  the  matching  process  to  focus  attention 
onto areas where new image features will be unveiled and old ones will disappear,
i.e., at the discontinuities and periphery of the field of view.

This  last  step  suggests  that,  in  the  analysis  of  a  time-varying  stereo 
sequence,  once  an  initial  correspondence  has  been  determined  between  left  and 
right  images,  it  is  not  necessary  to  establish  correspondence  anew  for  the  entire 
image  pair  at  subsequent  times.  Most  of  the  image  features  merely  flow  to  new 
locations  which  can  be  predicted.  Matching  need  only  be  performed  on  new 
features  which  enter  the  visible  field  from  the  periphery  and  from  behind  occlud¬ 
ing  boundaries. 

In  this  paper  we  formulate  several  of  these  steps  toward  stereo-motion 
fusion.  Section  2  reviews  the  basic  monocular  image  flow  relations  for  rigid 
bodies  in  motion.  The  importance  of  locally  second-order  flows  and  boundaries  of 
analyticity  (i.e.,  weak  and  strong  flow  discontinuities)  is  stressed  as  it  is  impor¬ 
tant  for  the  binocular  flow  analysis  that  follows.  In  Section  3  we  develop  the 
theory  of  binocular  image  flows  in  the  context  of  a  parallel  stereo  configuration, 
imaging  a  scene  of  rigid  objects  in  motion.  A  correlation  is  derived  between  rela¬ 
tive  flow  (binocular  difference  flow)  and  stereo  disparity,  laying  the  basis  for  a 
new  kind  of  matching  procedure.  This  leads  us  to  speculate  on  the  class  of  “head 
motions”  that  are  most  discerning  in  light  of  this  correlation.  Other  relations 
between  monocular  flow  and  binocular  flow  are  obtained  as  well.  In  Section  4  we 
utilize  an  experimental  data  set  for  a  short  stereo  sequence  to  obtain  the  meas¬ 
ured  binocular  image  flows  at  one  time  instant.  These  flows  are  then  filtered 
using  the  Velocity  Functional  Method,  and  a  flow  segmentation  is  derived  in 
order to detect depth and orientation discontinuities in the scene. This data is
then  used  to  confirm  the  correlation  between  binocular  flow  and  disparity 
developed  earlier.  Section  5  describes  two  ways  that  this  binocular  difference 
flow-disparity  relation  may  be  implemented  in  order  to  establish  correspondence 
in  the  context  of  “local  support.”  The  ability  to  combine  different  matching  cri¬ 
teria  is  considered  as  well.  We  conclude  in  Section  6  with  a  discussion  of  what 
remains  to  be  done  in  the  construction  of  a  complete  stereo-motion  fusion 
module. 
2.  MONOCULAR  IMAGE  FLOWS

Investigations  into  the  recovery  of  3-D  structure  and  motion  from  time- 
varying  monocular  imagery  have  proceeded  along  two  rather  distinct  paths.  One 
approach  has  been  concerned  with  the  motion  of  discrete  points  moving  rigidly 
in  space  (Ullman  1979;  Prazdny  1980;  Longuet-Higgins  1981;  Tsai  and  Huang 
1981a,b; Adiv 1984). The resulting 3-D interpretation is in the form of rigid body
motion parameters and relative depth of points in space. The second approach
treats the image flow field as a whole (Koenderink and van Doorn 1976;
Longuet-Higgins  and  Prazdny  1980;  Waxman  and  Ullman  1983;  Wohn  1984; 
Waxman  and  Wohn  1985)  in  an  attempt  to  recover  the  rigid  body  motion  param¬ 
eters  and  surface  descriptions  (slopes  and  curvatures)  of  entire  surface  patches. 
Recently,  work  has  begun  on  the  3-D  recovery  of  structure  from  non-rigid  body 
motions  (Ullman  1983;  Koenderink,  private  communication).  Our  formulation  of 
binocular  image  flows  will  follow  the  continuous  field  approach  developed  for 
monocular  flows  generated  by  textured  objects  in  rigid  body  motion  (Waxman 
and  Ullman  1983,  Waxman  1984,  Waxman  and  Wohn  1984,  Wohn  1984). 

We  consider  a  scene  as  comprised  of  objects  in  independent  rigid  body 
motion  with  respect  to  the  observer.  The  individual  objects  are  imagined  as 
decomposed  into  surface  patches  visible  to  the  observer,  and  these  surface 
patches  in  space  project  into  neighborhoods  in  the  image.  It  is  actually  the  sur¬ 
face  texture  and  shading  which  is  observed  under  perspective  projection  in  the 
image. Due to the relative motion between object and observer, the projected
texture undergoes deformations which reflect the image flow field. The theory of
monocular  image  flows,  developed  by  Waxman  and  collaborators  (cf.  References), 
provides  techniques  for  the  recovery  of  flow  fields  and  deformation  parameters 
from  evolving  contours,  edge  fragments  and  feature  points  in  the  imagery,  and 
for  recovery  of  3-D  surface  structure  and  rigid  body  motion  from  these  deforma¬ 
tions.  As  these  ideas  provide  the  starting  point  for  binocular  flow  analysis,  they 
are  reviewed  in  more  detail  here. 

2.1  Image  Velocity  Relations 

As  a  textured,  rigid  object  moves  through  space,  the  evolving  image 
sequence  registered  by  a  monocular  observer  (e.g.  a  moving  pin-hole  camera)  con¬ 
tains  information  in  the  form  of  an  image  flow  field.  This  image  flow  is  deter¬ 
mined  by  the  relative  rigid  body  motion  between  object  and  observer,  as  well  as 
the  structure  of  the  object’s  surface  visible  to  the  observer.  Derivation  of  this 
flow  field  follows  that  of  Waxman  and  Ullman  (1983). 

We  attribute  the  relative  rigid  body  motion  to  an  observer  represented  by 
the  spatial  coordinate  system  (X,  Y,  Z  )  in  Figure  1.  The  origin  of  this  system  is 
located  at  the  vertex  of  perspective  projection,  and  the  Z  -  axis  is  directed  along 
the  center  of  the  instantaneous  field  of  view.  The  instantaneous  rigid  body 
motion  of  this  coordinate  system  is  specified  in  terms  of  the  translational  velocity 
$\mathbf{V} = (V_X, V_Y, V_Z)$ of its origin and its rotational velocity
$\boldsymbol{\Omega} = (\Omega_X, \Omega_Y, \Omega_Z)$. The 2-D image sequence is
created by the perspective projection of the object onto a planar screen oriented
normal to the Z-axis. The origin of the image coordinate system (x, y) on the
screen is located in space at (X, Y, Z) = (0, 0, 1); that is, the image is
reinverted and scaled to a focal length of unity.

Due to the observer’s motion, a point P in space (located by position vector
$\mathbf{R}$) moves with a relative velocity
$\mathbf{U} = -(\mathbf{V} + \boldsymbol{\Omega} \times \mathbf{R})$. At each
instant, point P projects onto the screen as point p with coordinates
(x, y) = (X/Z, Y/Z). The corresponding image velocities of point p are
$(v_x, v_y) = (\dot{x}, \dot{y})$, obtained by differentiating the image
coordinates with respect to time and utilizing the components of $\mathbf{U}$ for
the time derivatives of the spatial coordinates of P. The result is

$$v_x = \left\{ x\,\frac{V_Z}{Z} - \frac{V_X}{Z} \right\} + \left[ xy\,\Omega_X - (1 + x^2)\,\Omega_Y + y\,\Omega_Z \right], \qquad (1a)$$

$$v_y = \left\{ y\,\frac{V_Z}{Z} - \frac{V_Y}{Z} \right\} + \left[ (1 + y^2)\,\Omega_X - xy\,\Omega_Y - x\,\Omega_Z \right]. \qquad (1b)$$

These  equations  define  an  instantaneous  image  flow  field,  assigning  a  unique 
2-D  image  velocity  v  to  each  direction  (x ,  y )  in  the  observer’s  field  of  view.  For 
the  moment,  we  shall  consider  only  a  single  surface  patch  of  some  object  in  the 
field  of  view.  A  small  but  finite  surface  patch  may  be  locally  approximated  by  a 
quadric  surface  in  space  as  described  by  six  parameters:  two  slopes,  three  curva¬ 
tures  and  an  overall  distance  scale.  If  the  surface  patch  is  described  in  this 
viewer-centered spatial coordinate system by Z = Z(X, Y), then it is
straightforward to find the corresponding local representation Z = Z(x, y) as a
second-order polynomial in terms of image coordinates. Of these six surface
parameters,  only  five  can  be  recovered  directly  from  the  image  flow  field;  the 
overall  scale  factor  is  lost  as  it  always  appears  in  ratio  with  the  translational  velo¬ 
city  V  (Waxman  and  Ullman  1983).  Moreover,  the  remaining  five  surface  param¬ 
eters  appear  in  product  with  the  translational  space  motion.  The  kinematic 
analysis  developed  by  Waxman  and  Ullman  (1983)  leads  to  a  set  of  twelve  alge¬ 
braic  equations  relating  this  3-D  structure  and  motion  to  derivatives  (through 
second order) of the image flow. Recovery of the 3-D information requires solution
of  nonlinear  equations. 
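
The image velocity relations (1a,b) can be checked numerically. The sketch below is only an illustration: the function names `image_flow` and `finite_difference_flow` and all numerical values are assumptions made for the example, not part of the original formulation. It evaluates (1) for one point and compares the result with a finite-difference estimate obtained by moving the point with U = -(V + Ω × R) and reprojecting.

```python
def image_flow(x, y, Z, V, Omega):
    """Image velocity (v_x, v_y) at image point (x, y) with depth Z,
    following relations (1a,b). V and Omega are 3-tuples."""
    VX, VY, VZ = V
    OX, OY, OZ = Omega
    vx = (x * VZ - VX) / Z + (x * y * OX - (1.0 + x * x) * OY + y * OZ)
    vy = (y * VZ - VY) / Z + ((1.0 + y * y) * OX - x * y * OY - x * OZ)
    return vx, vy


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def finite_difference_flow(R, V, Omega, dt=1e-6):
    """Move the space point R with U = -(V + Omega x R) for a short time dt
    and difference its perspective projections (x, y) = (X/Z, Y/Z)."""
    w = cross(Omega, R)
    U = tuple(-(V[i] + w[i]) for i in range(3))
    R2 = tuple(R[i] + dt * U[i] for i in range(3))
    x1, y1 = R[0] / R[2], R[1] / R[2]
    x2, y2 = R2[0] / R2[2], R2[1] / R2[2]
    return (x2 - x1) / dt, (y2 - y1) / dt


# Hypothetical test values (arbitrary units).
R = (0.4, -0.3, 5.0)
V = (0.2, -0.1, 0.5)
Omega = (0.01, -0.02, 0.03)

analytic = image_flow(R[0] / R[2], R[1] / R[2], R[2], V, Omega)
numeric = finite_difference_flow(R, V, Omega)
```

The two estimates should agree up to the truncation error of the finite difference. Note that scaling Z and V by the same factor leaves `analytic` unchanged, which is the scale ambiguity of monocular flow described above.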

2.2  Second-Order  Image  Flows 

In  the  recovery  of  surface  structure  and  3-D  motion  from  image  flow,  it  is 
sufficient  to  describe  an  image  flow  as  a  locally  second-order  flow  field.  This  has 
implications with regard to the surfaces which generate the flow itself. For
example, a planar surface patch $Z = Z_0 + pX + qY$ may be described exactly as
$Z = Z_0 (1 - px - qy)^{-1}$ in image coordinates. Substitution into the velocity
equations above yields expressions in the form of second-order polynomials. For
planar  surfaces,  such  second-order  flows  are  globally  valid.  On  the  other  hand, 
quadric  surfaces  generate  flows  which  are  not  simple  polynomials  in  the  image 
coordinates.  However,  they  may  be  locally  approximated  as  second-order  flows. 
The  coefficients  of  this  second-order  flow  then  determine  the  slopes  and  (scaled) 
curvatures  of  the  quadric  surface  patch  as  well  as  its  (scaled)  space  motion.  In 
this context, a complex surface is viewed as a composite of overlapping planar
and  quadric  patches.  The  image  flow  associated  with  a  smooth  surface  is,  there¬ 
fore,  a  slowly  varying  (in  terms  of  image  coordinates)  second-order  flow  defined 
over  a  region  of  the  image. 

In  order  to  recover  the  second-order  flow  approximation  for  any  neighbor¬ 
hood  in  the  image,  it  is  necessary  to  have  a  sufficiently  dense  texture  present  in 
that  neighborhood.  This  texture  gives  rise  to  extended  contours,  edge  fragments 
and  point  features,  all  of  which  are  convected  along  and  deformed  by  the  local 
image  flow.  These  features  serve  to  sample  components  of  the  flow  field;  in  par¬ 
ticular,  the  contours  and  edges  yield  an  estimate  of  the  flow  in  the  direction  nor¬ 
mal  to  the  contours  themselves.  The  Velocity  Functional  Method  (Waxman  and 
Wohn  1984)  may  then  be  used  to  recover  the  local  flow  from  these  sampled  com¬ 
ponents. 

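We model the components of the local velocity field by second-order
polynomials; hence, define the partial derivatives of image velocity evaluated at
a local origin as

$$v_x^{(i,j)} \equiv \left.\frac{\partial^{\,i+j} v_x}{\partial x^i\,\partial y^j}\right|_{0}, \qquad v_y^{(i,j)} \equiv \left.\frac{\partial^{\,i+j} v_y}{\partial x^i\,\partial y^j}\right|_{0}. \qquad (2)$$

Then the components of instantaneous velocity in the neighborhood are described
by the two functionals

$$v_x(x, y) = \sum_{i=0}^{2} \sum_{j=0}^{2} v_x^{(i,j)}\,\frac{x^i y^j}{i!\,j!}, \qquad (i + j \le 2) \qquad (3a)$$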
Note that (20) requires these combined monocular-binocular flow quantities to be
linear functional forms in the variables $(x, y, \delta)$. Once correspondence is
established between left and right images (as in Step 2), the measured disparities
may be locally fit over small analytic neighborhoods to a linear form motivated
by (14), thereby determining local surface structure (as described in Step 3).
Then equations (20) may be used to fit linear forms to the measured flow
quantities over analytic regions (in the least-squares sense), and thus determine
the absolute rigid body motion parameters $\mathbf{V}$ and $\boldsymbol{\Omega}$
for that region. This requires solving only linear equations. (Recall that
structure and motion from monocular flow required solution of nonlinear
equations.) This corresponds to Step 4 of the stereo-motion fusion module
described in Section 1.
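
The linear recovery of motion from relations (20) can be sketched as an ordinary least-squares problem. The code below is a minimal illustration under stated assumptions: the names `solve_linear` and `recover_motion`, the use of normal equations, and the synthetic planar scene are all choices made for this example, and the baseline b is assumed known. Each matched feature contributes three linear equations, (20a) to (20c), in the six unknowns (V_X, V_Y, V_Z, Ω_X, Ω_Y, Ω_Z).

```python
def solve_linear(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def recover_motion(features, b):
    """features: list of (x, y, delta, vx, vy, ratio), where
    ratio = Delta v_x / delta. Returns least-squares (V, Omega)."""
    rows, meas = [], []
    for x, y, d, vx, vy, ratio in features:
        s = -d / b
        rows.append([s, 0.0, 0.0, 0.0, -1.0, y]); meas.append(vx - x * ratio)  # (20a)
        rows.append([0.0, s, 0.0, 1.0, 0.0, -x]); meas.append(vy - y * ratio)  # (20b)
        rows.append([0.0, 0.0, s, -y, x, 0.0]);   meas.append(-ratio)          # (20c)
    n = 6
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atm = [sum(r[i] * m for r, m in zip(rows, meas)) for i in range(n)]
    theta = solve_linear(AtA, Atm)
    return tuple(theta[:3]), tuple(theta[3:])


# Synthetic, noise-free data on a sloped plane (hypothetical values).
b, Z0, p, q = 0.1, 4.0, 0.3, -0.2
V_true = (0.2, -0.1, 0.5)
O_true = (0.01, -0.02, 0.03)
VX, VY, VZ = V_true
OX, OY, OZ = O_true
features = []
for x in (-0.2, 0.0, 0.2):
    for y in (-0.15, 0.05, 0.25):
        d = (b / Z0) * (1.0 - p * x - q * y)            # disparity, relation (14)
        inv_Z = d / b
        vx = (x * VZ - VX) * inv_Z + x * y * OX - (1 + x * x) * OY + y * OZ
        vy = (y * VZ - VY) * inv_Z + (1 + y * y) * OX - x * y * OY - x * OZ
        ratio = (VZ / b) * d + (y * OX - x * OY)        # relation (13a)
        features.append((x, y, d, vx, vy, ratio))

V_est, O_est = recover_motion(features, b)
```

With exact data the estimates reproduce `V_true` and `O_true`; with measured flows the same fit would return the least-squares motion for the region. Forming normal equations is the simplest option for a six-parameter problem; a QR or SVD based solver would be a more robust choice for noisy data.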

The obvious symmetries displayed by equations (20) suggest that they may
be written in vector notation. Corresponding to the 3-D space position vector
$\mathbf{R} = (X, Y, Z)$ we introduce the 3-D image position vector
$\mathbf{r} = \mathbf{R}/Z = (x, y, 1)$. The 3-D image velocity is defined as
$\mathbf{u} = \dot{\mathbf{r}} = (v_x, v_y, 0)$. Then, recalling that
$\Delta v_x = \dot\delta$, we can rewrite (20) as

$$\mathbf{u} - \frac{\dot\delta}{\delta}\,\mathbf{r} = -\left[ \frac{\delta}{b}\,\mathbf{V} + \boldsymbol{\Omega} \times \mathbf{r} \right]. \qquad (21)$$

It is not coincidental that (21) bears a strong resemblance to the relation for 3-D
space velocity of a point induced by an observer’s rigid body motion, i.e.,
$\mathbf{U} = \dot{\mathbf{R}} = -(\mathbf{V} + \boldsymbol{\Omega} \times \mathbf{R})$.
In fact, (21) is exactly this relationship for $\mathbf{U}/Z$!
matching: the $V_Z$ head motion using the $v_x$ component, and the $\Omega_Z$ head
motion using the $v_y$ component. In the experiments to be described in Section 4,
we  have  examined  the  former  case  while  viewing  a  frontal  plane.  The  possibility 
of  more  complex  head  motions  requires  further  analysis. 

3.4  Monocular  -  Binocular  Flow  Relations 

In  addition  to  the  correlation  that  exists  between  binocular  difference  flow 
and  disparity  (13),  there  are  some  interesting  relations  between  this  binocular 
flow  and  the  monocular  flow  (as  seen  on  the  cyclopean  image,  say).  The  Cyclo¬ 
pean  image  velocity  of  a  feature  is  the  average  of  the  corresponding  feature  velo¬ 
cities  in  the  left  and  right  images  for  this  parallel  stereo  configuration.  Equations 
(1)  can  be  interpreted  as  the  monocular  flow  in  cyclopean  image  coordinates,  with 
space  motion  parameters  and  depth  interpreted  accordingly.  Relations  (13)  and 
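(14) are valid in cyclopean coordinates as well. Replacing 1/Z by $\delta/b$ in (1)
and combining with (13), we find

$$v_x - x \left( \frac{\Delta v_x}{\delta} \right) = -\Omega_Y + y\,\Omega_Z - \frac{V_X}{b}\,\delta, \qquad (20a)$$

$$v_y - y \left( \frac{\Delta v_x}{\delta} \right) = \Omega_X - x\,\Omega_Z - \frac{V_Y}{b}\,\delta, \qquad (20b)$$

$$- \left( \frac{\Delta v_x}{\delta} \right) = x\,\Omega_Y - y\,\Omega_X - \frac{V_Z}{b}\,\delta. \qquad (20c)$$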

Equations  (20)  have  been  written  with  “measurable”  quantities  on  the  left- 
hand  side  and  unknown  motion  parameters  as  coefficients  on  the  right-hand  side. 
motions for animals and machines.

For a preliminary determination of the “head motions” that are most
discriminating for matching purposes, we have examined Equation (19) for each
of the six motion components separately while viewing a planar surface sloped in
the X-direction only. In particular, we seek the motions that produce a velocity
difference field which is least sensitive to noise from the measurement of the
individual velocity fields. More precisely, a motion that facilitates matching must
have two characteristics. First, the velocity component used for matching must
be measurable. For example, for an $\Omega_Y$ head motion, the $v_y$ component is
$O(xy)$ while the $v_x$ component is $O(1)$. Thus, the $v_y$ component cannot be
measured accurately. However, a $V_Z$ head motion produces velocity components
of equal magnitude. In order to discriminate correct from incorrect matches, we
also require that for potential matches, the error $(\Delta v - \Delta v|_c)/v$
should scale like the percentage error in the disparity. Table I contains the
results of the analysis including the x and y components of the image flow
velocity for the cyclopean system ($v$), the velocity differences for incorrect
($\Delta v$) and correct ($\Delta v|_c$) matches, and the error in the velocity
difference for incorrect matches divided by the velocity in the cyclopean image,
$(\Delta v - \Delta v|_c)/v$. From Table I we see that this error function is
$O(1)$ in four cases listed in the table: the $v_x$ component for $V_Z$ and
$\Omega_X$ motions, and the $v_y$ components for $\Omega_Y$ and $\Omega_Z$
motions. Two of these, $v_x$ for $\Omega_X$ motions and $v_y$ for $\Omega_Y$
motions, will be inaccurate because the particular velocity component is $O(xy)$
compared to its companion velocity component. Thus we are left with two motions
that facilitate
denote them by $Z_l$ and $Z_r$, respectively. Then it is a straightforward exercise,
following Section 3.1, to derive an expression for the ratio $\Delta\mathbf{v}/\delta$
in the case of an incorrect match. In the cyclopean coordinate system we find,


where  the  upper/lower  expressions  in  curly  brackets  refer  to  the  x /y  components 
of  the  ratio. 

There are two sets of terms which cause the ratio $\Delta\mathbf{v}/\delta$ to deviate from its
correct  value  when  a  false  match  is  chosen.  The  first  set  is  proportional  to  the 
deviation  from  the  correct  disparity  value,  and  is  generated  by  relative  rotations 
between  the  objects  and  the  eyes/cameras.  The  second  set  is  proportional  to  the 
depth  error,  which  vanishes  for  a  frontal  plane  (it  is  proportional  to  the  X- 
component  of  slope  times  disparity  deviation).  This  second  set  is  generated  by  a 
combination  of  relative  translational  and  rotational  motions.  If  we  consider  the 
case  when  objects  in  the  scene  are  stationary  and  all  motions  are  due  to  the 
cyclopean  coordinate  system,  then  only  one  particular  motion  contributes  to 
every term present. This is the motion $\Omega_Y$, corresponding to a rotation
about a vertical axis as due to a rotation of the head about the neck. (Perhaps
this is why our eyes are aligned perpendicular to our necks!) It is also
interesting to note that translation in the direction of gaze, $V_Z$, contributes
to both components of the ratio (19) as well. Both $\Omega_Y$ and $V_Z$ are quite
natural exploratory head
$$\frac{\dot\delta}{\delta} = \frac{V_Z}{b}\,\delta(x, y) + (y\,\Omega_X - x\,\Omega_Y), \qquad (18)$$

which  is  identical  with  relation  (13a).  Thus,  this  correlation  between  relative 
image  flows  and  stereo  disparity  is,  in  fact,  a  relationship  between  disparity  and 
its  rate  of  change! 


3.3  Disambiguating  “Head  Motions” 

In  Section  5  below,  we  describe  how  the  correlation  between  relative  flow 
and  disparity  (13)  or  (15),  may  be  used  in  establishing  correspondence  between 
left  and  right  images.  But  the  basic  idea  is  that,  for  a  set  of  hypothetical 
matches  among  features  in  a  neighborhood,  the  measured  ratio  of  relative  flow  to 
disparity  should  be  consistent  with  a  known  functional  form,  i.e.,  a  linear  form  as 
suggested  by  (15).  However,  if  this  correlation  is  to  be  useful  in  establishing 
correspondence,  it  must  be  capable  of  disambiguating  false  matches.  By  consid¬ 
ering  the  possibility  of  false  matches,  we  can  examine  the  class  of  “head  motions” 
(or  camera  motions)  that  generate  significant  deviations  from  the  derived  correla¬ 
tion. 

Consider,  then,  the  relative  image  flow  between  a  feature  in  the  left  image 
and some feature in the right image shifted horizontally by an angle $\delta$ and
lying along the same epipolar line. When these features do in fact match, the
shift $\delta$ equals the “correct disparity” $\delta_c$. For a correct match the
depth values of the
left  and  right  features  are  equal.  But  for  an  incorrect  match  they  need  not  be; 
3.2  Interpreting  the  Correlation

The correlation between relative flow $\Delta\mathbf{v}$ and disparity $\delta$,
presented in cyclopean coordinates in (13a,b), is simple to interpret. Recall that
we are considering only a parallel stereo imaging geometry; hence, the epipolar
lines are horizontal (i.e., parallel to the x-axes). Now the relative flow
$\Delta\mathbf{v}$ represents the
rate  of  separation  of  a  feature  in  one  image,  from  its  match  in  the  other  image. 
It  is  the  rate  of  change  of  vector  disparity.  As  a  feature  and  its  match  must 
always  lie  along  some  epipolar  line,  its  vertical  disparity  must  remain  zero  in  this 
case.  Thus,  relation  (13b)  expresses  the  fact  that  a  feature  and  its  match  must 
flow  perpendicular  to  epipolars  at  the  same  rate  in  order  to  lie  on  a  common  epi¬ 
polar.  In  general,  the  rate  of  change  of  vertical  disparity  must  be  such  as  to  keep 
a  feature  and  its  match  on  an  epipolar  line. 

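For our parallel stereo configuration, we may then identify $\Delta v_x$ with the
rate of change of (horizontal) disparity and denote it by $\dot\delta$. Returning
to expression (14), in which $\delta = b/Z$, we have

$$\frac{\dot\delta}{\delta} = -\frac{\dot Z}{Z}. \qquad (16)$$

From $\mathbf{U} = -(\mathbf{V} + \boldsymbol{\Omega} \times \mathbf{R})$ we have
$\dot Z = -V_Z - \Omega_X Y + \Omega_Y X$, hence,

$$-\frac{\dot Z}{Z} = \frac{V_Z}{Z} + (y\,\Omega_X - x\,\Omega_Y) = \frac{V_Z}{b}\,\delta + (y\,\Omega_X - x\,\Omega_Y). \qquad (17)$$

Combining (17) with equation (16) yields for $\dot\delta/\delta$,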
$$\frac{\Delta v_x(x, y; \delta)}{\delta(x, y)} = \frac{V_Z}{b}\,\delta + (y\,\Omega_X - x\,\Omega_Y), \qquad (13a)$$

$$\frac{\Delta v_y(x, y; \delta)}{\delta(x, y)} = 0, \qquad (13b)$$

with  image  coordinates  and  motion  parameters  corresponding  to  the  cyclopean 
coordinate  system. 

If  we  consider  the  relative  flow  in  a  small  enough  neighborhood  such  that  the 
underlying  surface  patch  may  be  treated  as  locally  planar,  then  we  have  a  simple 
expression  for  the  local  disparity  field, 


$$\delta(x, y) = \frac{b}{Z(x, y)} = \frac{b}{Z_0}\,(1 - px - qy), \qquad (14)$$

where  Z0  is  the  depth  to  the  plane  measured  along  the  center  of  the  cyclopean 
field  of  view,  and  p  and  q  are  the  components  of  local  slope.  Substituting  (14) 
for  the  disparity  on  the  right-hand  side  of  (13)  yields  the  local  relative  flow  to 
disparity  relations, 


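$$\frac{\Delta v_x(x, y)}{\delta(x, y)} = \frac{V_Z}{Z_0} - \left( \Omega_Y + p\,\frac{V_Z}{Z_0} \right) x + \left( \Omega_X - q\,\frac{V_Z}{Z_0} \right) y, \qquad (15a)$$

$$\frac{\Delta v_y(x, y)}{\delta(x, y)} = 0. \qquad (15b)$$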


We  see  that  locally,  the  relative  flow  to  disparity  ratio  is  a  linear  function  of 
image  coordinates  with  coefficients  depending  on  the  surface  structure  and  rela¬ 
tive  motion  between  object  and  observer.  In  Section  5,  we  shall  describe  how  this 
correlation  between  relative  flow  and  disparity  can  form  the  basis  of  a  stereo 
matching  procedure. 
Equations (9a,b) yield the image velocities of corresponding features in the two
cameras/eyes.

Now  we  define  the  “relative  flow”  (or  binocular  difference  flow  )  of  features 
between  the  left  and  right  images  as  the  difference  between  the  “shifted  flow 
fields”,  the  “shift”  being  associated  with  the  disparity  field; 


$$\Delta\mathbf{v}(x_l, y_l; \delta) \equiv \mathbf{v}_r(x_l + \delta(x_l, y_l),\, y_l) - \mathbf{v}_l(x_l, y_l). \qquad (10)$$

Expanding the coefficient matrices of (9a,b) according to equations (1),
forming the relative flow (10), and simplifying yields the following expressions
for the components of relative flow:

$$\Delta v_x(x_l, y_l; \delta) = \frac{1}{b}\,V_Z\,\delta^2 + (y_l\,\Omega_X - x_l\,\Omega_Y)\,\delta, \qquad (11a)$$

$$\Delta v_y(x_l, y_l; \delta) = 0. \qquad (11b)$$


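Forming the ratio of relative flow to disparity yields

$$\frac{\Delta v_x(x_l, y_l; \delta)}{\delta(x_l, y_l)} = \frac{1}{b}\,V_Z\,\delta + (y_l\,\Omega_X - x_l\,\Omega_Y), \qquad (12a)$$

$$\frac{\Delta v_y(x_l, y_l; \delta)}{\delta(x_l, y_l)} = 0. \qquad (12b)$$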
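
Relations (11) and (12) can be checked numerically for a single feature. The following sketch rests on assumptions that are spelled out in the comments: the right camera shares the left camera's rotation, its translational velocity is taken as V - Ω × b with baseline vector b = (b, 0, 0), and the matching right-image point lies at x_l + δ with δ = b/Z. The names `image_flow` and `relative_flow` and all numerical values are hypothetical.

```python
def image_flow(x, y, Z, V, Omega):
    """Monocular image velocity from relations (1a,b)."""
    VX, VY, VZ = V
    OX, OY, OZ = Omega
    vx = (x * VZ - VX) / Z + (x * y * OX - (1.0 + x * x) * OY + y * OZ)
    vy = (y * VZ - VY) / Z + ((1.0 + y * y) * OX - x * y * OY - x * OZ)
    return vx, vy


def relative_flow(xl, yl, Z, V, Omega, b):
    """Binocular difference flow (10) for a feature at (xl, yl) in the left
    image, for a parallel stereo pair with baseline b.

    Assumption: the right camera translates with V - Omega x (b, 0, 0) and
    rotates with the same Omega; the depth Z is common to both views."""
    OX, OY, OZ = Omega
    delta = b / Z                                   # disparity
    # Omega x (b, 0, 0) = (0, OZ * b, -OY * b)
    V_right = (V[0], V[1] - OZ * b, V[2] + OY * b)
    vl = image_flow(xl, yl, Z, V, Omega)
    vr = image_flow(xl + delta, yl, Z, V_right, Omega)
    return vr[0] - vl[0], vr[1] - vl[1], delta


# Hypothetical values (arbitrary units).
b = 0.1
V = (0.2, -0.1, 0.5)
Omega = (0.01, -0.02, 0.03)
xl, yl, Z = 0.15, -0.05, 4.0

dvx, dvy, delta = relative_flow(xl, yl, Z, V, Omega, b)
measured_ratio = dvx / delta
predicted_ratio = V[2] * delta / b + (yl * Omega[0] - xl * Omega[1])   # (12a)
```

Under these assumptions `measured_ratio` equals `predicted_ratio` up to rounding and `dvy` vanishes, as stated by (12a,b). Only V_Z, Ω_X and Ω_Y enter the ratio; the remaining motion components cancel in the difference.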


We shall interpret expressions (12a,b) momentarily. But first note that this
ratio of relative flow to disparity is linear in the variables $x_l$, $y_l$ and
$\delta$, with coefficients proportional to the unknown parameters of relative
motion. The reader may verify that, when reexpressed in the cyclopean coordinate
system (midway between the two cameras/eyes), expressions (12) remain
unchanged! Thus, we may suppress the subscript “l” in (12) and write instead,
the feature at $(x_l, y_l)$ in the left image. Note that over a particular analytic flow
region, the (horizontal) disparity forms an analytic scalar field generated by the
smooth depth function $Z_l(x_l, y_l)$. And since the left and right coordinate systems
are parallel, the depth function for the corresponding region in the right image
may be expressed as

$$Z_r(x_r, y_r) = Z_r(x_l + \delta(x_l, y_l),\, y_l) = Z_l(x_l, y_l). \tag{7}$$

Let us rewrite the monocular image velocity relations (1) in terms of translation
and rotation coefficient matrices,

$$\mathbf{v}(x,y) = \frac{1}{Z(x,y)}\,T(x,y)\cdot\mathbf{V} + B(x,y)\cdot\boldsymbol{\Omega}; \tag{8}$$

these 2 × 3 matrices being functions of image coordinates alone with elements
easily obtained from relations (1). Now an expression like (8) may be associated
with each image in our stereo configuration; the coordinates, motion parameters
and depth function are, however, different. In order to relate the left and right
image flows for a given region, we shall express both flows in terms of the left
coordinate system by using expressions (5, 6, 7). Thus, the left image flow is given
by

$$\mathbf{v}_l(x_l, y_l) = \frac{1}{b}\,\delta(x_l, y_l)\,T(x_l, y_l)\cdot\mathbf{V}_l + B(x_l, y_l)\cdot\boldsymbol{\Omega}_l, \tag{9a}$$

while the right image flow is given by

$$\mathbf{v}_r(x_l + \delta, y_l) = \frac{1}{b}\,\delta(x_l, y_l)\,T(x_l + \delta, y_l)\cdot\{\mathbf{V}_l - \boldsymbol{\Omega}_l \times b\,\hat{\mathbf{x}}\} + B(x_l + \delta, y_l)\cdot\boldsymbol{\Omega}_l. \tag{9b}$$
that the left and right cameras/eyes are in motion with respect to each other
when relative motion between object and observer is ascribed to the observer. If
according to the left coordinate system the rigid body motion parameters of a
region are $(\mathbf{V}_l, \boldsymbol{\Omega}_l)$, then in the right coordinate system that same region has
motion parameters $(\mathbf{V}_r, \boldsymbol{\Omega}_r)$, where

$$\boldsymbol{\Omega}_r = \boldsymbol{\Omega}_l, \tag{5a}$$

$$\mathbf{V}_r = \mathbf{V}_l - \boldsymbol{\Omega}_l \times b\,\hat{\mathbf{x}}, \tag{5b}$$

and $\hat{\mathbf{x}}$ is a unit vector in the common $x$-direction.

Thus,  the  image  flow  fields  of  the  two  eyes/cameras  differ  in  magnitude  as 
well  as  distribution  (due  to  stereo  disparity).  And  as  both  stereo  disparity  and 
monocular  flow  vary  inversely  with  depth,  we  should  not  be  surprised  that  bino¬ 
cular  flow  and  disparity  are  related  in  a  simple  way.  In  fact,  we  shall  see  that 
binocular  flow  is  synonymous  with  “rate-of-change  of  disparity.” 

3.1  Relative  Flow  -  Disparity  Relation 

Given the parallel stereo configuration, we have the simple case of
corresponding features lying along horizontal epipolar lines. Thus, a feature
located at position $(x_l, y_l)$ in the left image at some instant of time is located at
$(x_r, y_r)$ in the right image, where

$$y_r = y_l, \tag{6a}$$

$$\delta(x_l, y_l) \equiv x_r - x_l = b / Z_l(x_l, y_l), \tag{6b}$$

$\delta(x_l, y_l)$ being the angular disparity between right and left image positions of
3. BINOCULAR IMAGE FLOWS

For  simplicity,  we  restrict  our  analysis  to  the  parallel  stereo  configuration 
illustrated  in  Figure  3.  The  left  and  right  image  planes  lie  in  a  common  plane 
with  the  fixation  point  located  at  infinity  (i.e.,  the  “eyes”  point  straight  ahead). 
The left and right coordinates, $(x_l, y_l)$ and $(x_r, y_r)$ respectively, have their
origins at the centers of their respective fields of view separated by a baseline of
magnitude $b$ along the common direction of the $x$-axes. Each image plane is
positioned at a focal length of unity with respect to a pin-hole located at the
vertex of projection for each separate camera/eye. This stereo configuration is
assumed  to  move  rigidly  with  respect  to  other  moving  objects  in  the  scene.  No 
allowance  has  been  made  for  vergence  of  the  eyes  (known  or  otherwise)  in  the 
current  formulation. 

Consider  the  monocular  flow  analysis  of  Step  1  already  performed  separately 
on  the  left  and  right  image  sequences.  The  analytic  flow  regions  bounded  by  flow 
discontinuities  are  assumed  to  be  brought  into  correspondence  rather  easily.  This 
can  be  accomplished  essentially  by  matching  the  flow  discontinuities  between  left 
and  right  images.  The  correspondence  is  gross,  but  allows  the  binocular  flow 
analysis  to  focus  attention  on  individual  regions.  Each  such  region  is  assumed  to 
correspond  to  a  smooth  surface  of  a  rigid  body.  Thus,  we  may  associate  with 
each  region  a  set  of  relative  rigid  body  motion  parameters.  However,  for  the  sake 
of  analysis,  if  we  ascribe  the  rigid  body  motion  to  the  “monocular  observer”,  as 
in  Figure  1  and  equations  (1),  then  the  rigid  body  motion  parameters  for  a  given 
region  are  different  for  the  left  and  right  cameras/eyes.  This  is  due  to  the  fact 
sitates the splitting and merging of neighborhoods in order to localize this
discontinuity. The beginnings of a control structure governing the automatic
segmentation of flow fields are presented in Section 4 below.

2.4  Monocular  Analysis  of  Binocular  Flows 

In  the  case  of  a  binocular  image  sequence,  the  monocular  flow  analysis 
described  above  is  to  be  applied  to  the  left  and  right  image  sequences  separately. 
But  rather  than  going  so  far  as  the  3-D  inference  from  monocular  flow  (Waxman 
and  Ullman  1983)  for  each  sequence,  we  consider  only  the  recovery  and  segmenta¬ 
tion  of  the  separate  image  flows.  This  segmentation  into  analytic  regions  (i.e., 
regions  of  slowly  varying  second-order  flow)  allows  gross  correspondence  to  be 
established  between  these  regions  in  the  left  and  right  images.  It  also  delineates 
the  depth  and  orientation  discontinuities  which  often  plague  stereo  matching  and 
surface  reconstruction  algorithms. 

This  completes  Step  1  of  our  stereomotion  fusion  module.  The  reconstructed 
flow  fields  for  the  left  and  right  images  are  brought  together  in  the  stage  of 
“binocular  flow  analysis”  described  next. 


2.3  Boundaries  of  Analyticity 

From  equations  (1)  it  is  apparent  that  the  flow  field  is  “functionally  ana¬ 
lytic”  (i.e.  twice  differentiable)  wherever  object  surfaces  Z  (x ,  y)  are  twice 
differentiable.  The  flow  is  non-analytic  at  points  where  Z  or  its  first  partials  are 
discontinuous,  and  where  the  relative  space  motion  parameters  change.  Such 
points  occur  along  occluding  boundaries  and  structural  edges  where  surface  orien¬ 
tation  changes  abruptly  (e.g.,  the  edges  of  a  polyhedron).  Thus,  an  image  flow 
field  is  naturally  partitioned  into  regions  of  analyticity  separated  by  singular  con¬ 
tours  (i.e.,  boundaries  of  analyticity).  These  analytic  regions  are,  in  turn,  decom¬ 
posed  into  neighborhoods  in  which  the  image  flow  is  locally  approximated  as  a 
second-order  flow.  It  is  part  of  a  complete  image  flow  analysis  to  delineate  these 
boundaries  of  analyticity  so  that  3-D  interpretations  can  be  assigned  to  the 
regions  within  them.  Figure  2  illustrates  this  partitioning  of  the  image  flow  field. 

In  order  to  detect  the  presence  of  a  boundary  of  analyticity  in  the  flow  field, 
we  try  to  “analytically  continue”  the  flow  from  one  neighborhood  to  the  next. 
This  is  accomplished  by  requiring  the  separate  second-order  flow  approximations 
determined  in  each  neighborhood  to  be  “compatible”  in  an  overlapping  area  com¬ 
mon  to  both  neighborhoods  (Wohn  1984;  Wohn  and  Waxman  1985b).  The  degree 
of  compatibility  between  neighboring  flow  approximations  is  measured  relative  to 
the  agreement  between  the  individual  approximations  and  the  data  from  which 
they  are  obtained.  When  neighboring  flow  approximations  are  deemed  “incompa¬ 
tible,”  it  is  assumed  that  a  boundary  of  analyticity  has  been  crossed.  This  neces- 
$$v_y(x,y) = \sum_{i=0}^{2}\sum_{j=0}^{2} v_y^{(i,j)}\,\frac{x^i y^j}{i!\,j!}, \qquad (i+j \le 2). \tag{3b}$$

Now consider a contour or edge fragment embedded in the neighborhood, along
which the normal flow has been measured; let this normal flow be given by
$v_n(x,y)$. Also, let the unit normal measured along the contour be given by
$\mathbf{n}(x,y) = (n_x, n_y)$. Then, since $v_n = \mathbf{v}\cdot\mathbf{n}$, it follows from (3a, b) that

$$v_n(x,y) = \sum_{i=0}^{2}\sum_{j=0}^{2} \frac{x^i y^j}{i!\,j!}\left[n_x(x,y)\,v_x^{(i,j)} + n_y(x,y)\,v_y^{(i,j)}\right], \qquad (i+j \le 2). \tag{4}$$

Equation  (4)  relates  the  normal  flow  along  the  contour  to  the  twelve  parameters 
(Taylor  coefficients)  that  characterize  the  full  flow  in  the  neighborhood.  For  each 
point  along  a  contour  at  which  normal  flow  and  the  unit  normal  are  measured, 
expression  (4)  provides  another  constraint  on  these  twelve  coefficients.  In  princi¬ 
ple,  twelve  measurements  along  a  contour  are  the  minimum  required  to  obtain  a 
set  of  twelve  linear  equations  for  the  twelve  unknowns.  In  practice,  it  is  better  to 
use  many  (perhaps  hundreds  of)  measurements  along  a  single  or  multiple  con¬ 
tours  and  edges  in  a  neighborhood,  and  let  equation  (4)  serve  as  the  basis  of  a 
least-squares  approach  for  obtaining  the  set  of  twelve  linear  equations.  Image 
velocity  measurements  at  points  can  easily  be  incorporated  into  (4)  by  choosing 
n  along  the  direction  of  point  motion.  The  Velocity  Functional  Method  has  been 
extended  to  incorporate  data  from  multiple  frames  by  considering  time-varying 
flows  (Wohn  and  Waxman  1985b).  In  this  manner  one  can  essentially  smooth  the 
flow  fields  over  time,  thereby  filtering  out  additional  noise. 
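The least-squares use of equation (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Wohn and Waxman: each normal flow measurement contributes one row of a linear system in the twelve Taylor coefficients, and the normal equations are solved by plain Gaussian elimination. The function names and the synthetic test data are hypothetical.

```python
import random
from math import factorial, cos, sin, pi

# (i, j) index pairs with i + j <= 2: six Taylor terms per flow component.
TERMS = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]

def basis(x, y):
    """Monomials x^i y^j / (i! j!) appearing in (3a, b) and (4)."""
    return [x ** i * y ** j / (factorial(i) * factorial(j)) for i, j in TERMS]

def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol

def fit_flow(samples):
    """samples: list of (x, y, nx, ny, vn). Returns 12 coefficients:
    six for v_x followed by six for v_y, ordered as in TERMS."""
    rows = []
    for x, y, nx, ny, vn in samples:
        phi = basis(x, y)
        rows.append(([nx * p for p in phi] + [ny * p for p in phi], vn))
    n = 12
    AtA = [[sum(a[i] * a[j] for a, _ in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(a[i] * v for a, v in rows) for i in range(n)]
    return solve_linear(AtA, Atb)

def synthetic_samples(coeffs, n=300, seed=0):
    """Hypothetical noise-free normal flow data generated from known coefficients."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        a = rng.uniform(0, 2 * pi)
        nx, ny = cos(a), sin(a)
        phi = basis(x, y)
        vx = sum(c * p for c, p in zip(coeffs[:6], phi))
        vy = sum(c * p for c, p in zip(coeffs[6:], phi))
        out.append((x, y, nx, ny, nx * vx + ny * vy))
    return out
```

With noise-free synthetic data the fit should reproduce the generating coefficients; with real measurements the many redundant rows are what average out the noise.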
3.5 Relation to Dynamic Stereo

An earlier attempt to recover depth to moving objects from relative image
flows was termed Dynamic Stereo by Waxman and Sinha (1984). The approach
was valid in the limit of negligible disparity, i.e., $b/Z \to 0$, and required
the two cameras to translate with respect to one another in order to develop a
difference flow field.

From equation (8) we see that negligible disparity implies, at lowest order,
that the coefficient matrices $T(x,y)$ and $B(x,y)$ are the same for both cameras.
Then, if the two cameras can translate with respect to each other by a known
amount $\Delta\mathbf{V}$, while their relative rotation is zero, a difference flow $\Delta\mathbf{v}$ results
which is independent of the relative object motions in the scene, i.e.,
$\Delta\mathbf{v} = Z^{-1}\,T(x,y)\cdot\Delta\mathbf{V}$. A known relative camera motion $\Delta\mathbf{V}$ and measured
relative flow $\Delta\mathbf{v}$ allow determination of depth $Z(x,y)$.
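As a small illustration of this depth recovery, the sketch below inverts the difference flow relation for Z at a single image point. The explicit form of the matrix T(x, y) is an assumption (the standard unit-focal-length form; equations (1) are not reproduced in this section), and the function names are hypothetical.

```python
def translation_matrix(x, y):
    """Assumed form of the 2x3 matrix T(x, y) in (8), unit focal length."""
    return [[-1.0, 0.0, x], [0.0, -1.0, y]]

def depth_from_difference_flow(x, y, dv, dV):
    """Least-squares depth Z from dv = (1/Z) T(x, y) . dV at one image point.

    dv: measured difference flow (dvx, dvy); dV: known relative camera
    translation (dVx, dVy, dVz).
    """
    T = translation_matrix(x, y)
    t = [sum(T[r][c] * dV[c] for c in range(3)) for r in range(2)]
    # Project the measured flow onto the predicted direction t to get 1/Z.
    inv_Z = (dv[0] * t[0] + dv[1] * t[1]) / (t[0] ** 2 + t[1] ** 2)
    return 1.0 / inv_Z
```

The projection uses both flow components, so a purely lateral relative translation and one with a component along the line of sight are handled by the same expression.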

Comparing this to the formulation in Section 3.1 of the binocular difference
flow-disparity relation, we see that equations (11) provide higher
order terms in powers of $(b/Z) = \delta$, the disparity. In our simulation studies
with Dynamic Stereo, we found that a finite baseline of one-thousandth the depth
would perturb the relative flow, resulting in a depth error of about 2%. This
perturbation can be accounted for by considering the terms in equations (11).
4. EXPERIMENTS

A limited experimental program was undertaken to demonstrate the feasibility
of implementing the first three steps of the stereo-motion module: Step 1 (flow
recovery and segmentation), Step 2 (establishing correspondence using the binocular
difference flow) and, to a limited extent, Step 3 (recovering surface structure).
Binocular image flow fields were obtained using a camera mounted on a robot
arm, viewing scenes consisting of white objects covered by black dots. In general,
the experiments were successful insofar as they confirmed the potential of overlap
compatibility for segmentation of laboratory flow data, and verified the binocular
difference flow-disparity relations for a particular configuration. Still, much work
remains before a fully automatic module is realized.

4.1  Apparatus  and  Procedures 

The moving pair of stereo cameras was simulated using a single black-and-white
Sony (model DC-37) CCD camera mounted on an American Robot MERLIN
robot arm. The images were digitized into 480 × 420 pixel arrays using a
Grinnell (GMR-27) display processor and memory. The angular field of view was
27.6 × 24.1 degrees (i.e., 996.7 pixels per radian). Throughout this section, all
angular measurements are given in units of pixels; time is in units of seconds.
Each image flow field was obtained from three frames taken with the camera at
three positions, equally spaced in time, on its trajectory. The trajectories and
viewing directions were chosen to simulate a pair of cameras in a parallel stereo
configuration (cf. Fig. 3). The baseline between cameras was 3.0 inches.

The  scenes  consisted  of  white  surfaces  covered  with  a  distribution  of  0.125 
inch  diameter  black  dots.  From  the  typical  viewing  distance  of  40  inches  the 
dots  appeared  in  the  image  with  a  diameter  of  3  pixels.  To  obtain  the  position  of 
the  dots  in  each  image,  individual  images  were  thresholded  and  centroids  of  black 
regions  were  found  according  to: 


$$x_c = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad y_c = \frac{1}{N}\sum_{i=1}^{N} y_i, \tag{22}$$

where $(x_i, y_i)$ are the image coordinates of the $N$ black pixels in each region.
The  centroids  of  the  dots  were  tracked  for  three  frames  and  velocities  at  the  cen¬ 
troids  in  the  central  frame  in  time  were  computed  according  to 
which is a central difference accurate to $O(\Delta t^2)$. The routine that tracks the
centroids from frame to frame assumes that the distance from the centroid in the
second frame to the corresponding centroid in the first or third frame is smaller
than the distance to any neighboring feature points. In addition, to ensure
reasonably accurate velocity measurements, the centroid displacements from frame
to frame must be 10 or more pixels. This simple approach limits the density of
feature points allowed in any one image. We have used images with about 200
feature points for analysis.
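The feature extraction and tracking steps described above can be sketched in a few lines. This is only an illustrative reading of the text: the centroid follows (22), the tracker picks the nearest centroid in the adjacent frame (which is the assumption the routine is stated to rely on), and the velocity is the usual three-frame central difference. The function names are hypothetical.

```python
from math import hypot

def centroid(pixels):
    """Equation (22): mean position of the N black pixels (x_i, y_i) of a region."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def nearest(point, candidates):
    """Nearest-neighbor tracking: the closest centroid in an adjacent frame."""
    return min(candidates, key=lambda c: hypot(c[0] - point[0], c[1] - point[1]))

def central_difference_velocity(c_prev, c_next, dt):
    """Velocity at the central frame from the first and third frames,
    accurate to second order in the frame interval dt."""
    return ((c_next[0] - c_prev[0]) / (2.0 * dt),
            (c_next[1] - c_prev[1]) / (2.0 * dt))

def track_and_measure(c_mid, frame1, frame3, dt):
    """Velocity of the feature at c_mid (second frame), given the centroid
    lists of the first and third frames."""
    return central_difference_velocity(nearest(c_mid, frame1),
                                       nearest(c_mid, frame3), dt)
```

Nearest-neighbor tracking of this kind fails once frame-to-frame displacements approach the spacing between features, which is why the method limits the feature density.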

4.2  Image  Flow  Segmentation 

We  have  analyzed  the  scene  shown  in  Figure  4,  which  consists  of  a  planar 
background  with  two  connected  planar  surfaces  in  the  foreground.  The  effective 
camera  motions,  also  shown  in  the  figure,  were  0.25  inches/sec  in  the  viewing 
direction  (toward  the  scene)  and  0.25  inches/second  in  the  X -direction  (parallel 
to  the  scene).  At  the  central  frame  the  cameras  were  about  40  inches  from  the 
foreground  surfaces.  Pictures  of  the  image  flows  obtained  in  this  way  are  shown 
in  Figure  5.  Each  velocity  field  consists  of  about  260  points. 

The  current  segmentation  program  reveals  the  potential  locations  of  flow 
discontinuities,  but  does  not  refine  them  nor  link  them  into  global  boundaries  of 
analyticity. The program first divides the image into $N^2$ equal-sized rectangles;
in this case, a 5 × 5 rectangular grid on each 480 × 420 pixel image. Each
rectangle contained an average of about 10 feature points. The velocity data in
each  rectangle  was  then  fit  to  a  pair  of  second-order  polynomials  (cf.  equations  3) 
using  a  linear  least  squares  approach.  The  error  per  point  between  the  data  and 
the  second  order  fit,  defined  as 

err  —  (N  \  vav3  \  )“‘  £ 
i=l 

was  typically  0.02  . 

In  an  attempt  to  see  if  the  polynomial  flow  fields  from  adjacent  rectangles 
were  compatible,  i.e.,  belonged  to  the  same  analytic  flow  region,  the  velocities 
were compared in overlapping neighborhoods. Specifically, at vertical boundaries
between left and right rectangles and at horizontal boundaries between upper and
lower rectangles, an overlap compatibility measure ($C_v$ and $C_h$, respectively) was
computed,

2  0  ft  11/2 

[—■ //„>-•*  >^»j  -  (25a) 

9  0  1 

c>  = j  ’  (25b) 

where  the  areas  Av  and  Ah  are  shown  in  Figure  6.  After  computing  the  compa¬ 
tibility  for  the  original  5X5  rectangular  grid,  the  calculations  were  repeated 
twice  with  the  grid  shifted  to  the  right  in  each  case  by  one-third  the  rectangle 
width  (approximately  the  distance  between  feature  points).  The  three  horizontal 
grid  positions  were  then  repeated  with  the  grid  shifted  down  by  one-half  the  rec¬ 
tangle  height.  Thus,  the  overlap  error  was  computed  for  the  boundaries  of  6  rec¬ 
tangular  grids  with  25  rectangles  in  each  grid.  A  plot  of  the  overlap  compatibility 
function  is  shown  in  Figures  7  and  8  for  the  vertical  boundaries  of  the  left  and 
right  images,  respectively.  Similar  plots  for  the  horizontal  boundaries  appear  in 
Figures 9 and 10. Consider the compatibility across vertical boundaries first
(Figures 7 and 8). Note that the contours with $C_v = 4$ (i.e., four times the error in
fitting the polynomials) do not correspond to any structural feature of the scene.
Thus, the noise level appears to be about 4. In Figure 7, both the vertical
occluding boundary and the vertical structural edge appear in the contours with
compatibility errors as high as 10, i.e., 2.5 times the noise level. For the
structural edge (i.e., the slope discontinuity) the largest values appear slightly to the
right of the feature. In Figure 8 similar contour shapes are seen, but the vertical
occluding boundary is only one rectangle width away from the left side of the
picture and is therefore not fully revealed by the contours. Note that these
contours also indicate, to some extent, the position of the horizontal occluding
boundary. This horizontal boundary is seen more clearly in the compatibility of
upper-lower pairs of rectangles (Figures 9 and 10). The compatibility function is
again typically 8 to 10 at the boundary.

The  flow  field  segmentation  results  indicate  that  the  overlap  compatibility 
method can successfully locate occluding boundaries (i.e., depth discontinuities)
and  to  some  extent  structural  edges  (i.e.,  slope  discontinuities)  in  real  data.  How¬ 
ever,  the  noise  level  and  resolution  of  the  results  need  to  be  improved.  It  is 
believed  that  both  of  these  problems  can  be  remedied  by  increasing  the  density  of 
data  points  in  the  images.  For  small  numbers  of  data  points  in  a  neighborhood, 
the  residual  between  the  measured  data  and  the  polynomial  fit  does  not  reach  a 
stable mean. Thus, both the coefficients of the polynomials and the residual
change significantly as data points are added to or removed from the fit. In the
present  examples,  since  only  10  data  points  were  used  to  fit  each  polynomial,  the 
results  were  not  statistically  stable  and  random  errors  contributed  to  both  the 
residuals  in  adjacent  neighborhoods  and  the  velocity  difference  in  the  overlap 
regions.  Thus,  the  noise  level  in  Cv  and  Ch  was  high.  The  resolution  (or  localiza¬ 
tion)  problem  is  controlled  by  the  size  of  the  rectangles  and  the  magnitude  of 
the shift in the grid position. In the present case the rectangles were large (1/5
the image size) but still only contained about 10 data points. The smallest
meaningful shift in the grid position is the average distance between data points, in
this  case  about  1/15  the  image  size.  Thus,  the  low  feature  point  density  resulted 
in  low  resolution.  The  low  density  of  feature  points  in  the  present  example  was 
necessitated  by  the  simple  method  used  to  find  feature  point  velocities  from  three 
successive  frames.  This  will  be  modified  in  future  work  to  remedy  the  present 
noise  and  resolution  problems. 

4.3  Binocular  Flow  Field  Experiments 

In  this  section  we  describe  a  preliminary  experimental  exploration  of  the 
binocular  flow  equations  (11).  In  particular,  a  Vz  motion  was  chosen  for  the  cam¬ 
era  pair  and  the  equations  were  verified.  It  was  pointed  out  in  Section  3.3  that 
the  Vz  motion  is  one  of  the  two  single  component  motions  that  will  allow  accu¬ 
rate  discrimination  between  correctly  and  incorrectly  matched  features.  The 
experiment  used  the  camera  set-up  described  earlier  to  simulate  a  pair  of  cameras 
separated  by  a  3  inch  baseline.  The  cameras  viewed  a  planar  surface  perpendicu¬ 
lar  to  the  viewing  direction  (i.e.,  a  frontal  plane).  The  velocity  fields  were 
obtained  with  the  cameras  at  43.5,  45.0  and  46.5  inches  from  the  surface.  The 
velocity  fields  obtained  in  this  manner  are  shown  in  Figure  11.  These  velocity 
fields  show  the  usual  pattern  with  a  focus  of  expansion  near  the  center  of  the 
image. Due to problems with the camera mount, it was not possible to align the
camera viewing direction with the direction of motion to better than 0.5 degrees.
With the simple motion and scenes used here, it was possible to correct for this
misalignment. In future experiments a pair of cameras aligned with a specially
designed stereo mount will be used to alleviate this problem.

The  binocular  flow  equations  (11)  were  verified  by  two  techniques:  one  using 
the  individual  data  points  and  the  other  using  the  polynomial  fits  to  the  velocity 
fields;  the  space  motion  being  known  in  both  cases  here  (which  is  not  generally 
true).  Feature  matching  using  the  individual  data  points  will  be  discussed  first. 
Because of the low density of data points and the fact that matches lie along
horizontal epipolars, the pointwise matching problem for this example can be solved
rather easily. Here we present an example of matching points in the left and
right images by trying the various combinations. Table II contains the
coordinates and velocities of four points in the right and left images with $y = 98.0 \pm 1.5$
pixels. The potential disparities $(x_l - x_r)$, the difference in the $v_x$, and
$V_z\delta^2/b$ are given for each of the possible sixteen combinations of the two sets of
four points. There are two constraints on the correct matches besides satisfying
equations (11). First, the disparity must be positive. This is a consequence of the
relative positions of the cameras. Second, the velocity difference $\Delta v_x$ must be
positive, as can be seen from equation (11a) with positive $b$ and $V_z$. Eight of
the sixteen combinations have these two properties. Of the surviving eight
combinations, only three have nearly equal values of $\Delta v_x$ and $V_z\delta^2/b$; they
correspond to correct matches. These are:
combination    $\delta$    $\Delta v_x$    $V_z\delta^2/b$

1r-2l          71.5        3.1             2.6
2r-3l          71.0        2.9             2.5
4r-4l          70.2        3.2             2.5

The  average  disparity  of  71.4  pixels  corresponds  to  a  distance  of  41.9  inches,  close 
to  the  correct  value  of  45  inches.  Below  we  shall  see  that  this  error  is  due  to  cam¬ 
era  misalignment. 
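The conversions behind these numbers are simple enough to spell out. The sketch below uses the calibration of Section 4.1 (996.7 pixels per radian, 3.0 inch baseline) and the disparity relation (6b). The value V_z = 1.5 inches per frame interval is an assumption inferred from the camera positions of 43.5, 45.0 and 46.5 inches; the function names are hypothetical.

```python
PIXELS_PER_RADIAN = 996.7   # Section 4.1
BASELINE_INCHES = 3.0       # Section 4.1
VZ_INCHES_PER_FRAME = 1.5   # assumption: spacing of the three camera positions

def depth_from_disparity(delta_px, b=BASELINE_INCHES, k=PIXELS_PER_RADIAN):
    """Invert (6b), delta = b / Z, with delta measured in pixels."""
    return b * k / delta_px

def predicted_flow_difference(delta_px, vz=VZ_INCHES_PER_FRAME,
                              b=BASELINE_INCHES, k=PIXELS_PER_RADIAN):
    """V_z * delta^2 / b from (11a) with zero rotation, in pixels per frame
    interval. One factor of k converts delta^2 from pixels^2 to pixel-radians."""
    return vz * delta_px ** 2 / (b * k)

def is_candidate(delta_px, dvx):
    """The two sign constraints stated in the text for a potential match."""
    return delta_px > 0 and dvx > 0

# Disparity, measured velocity difference for the three reported matches.
reported = {"1r-2l": (71.5, 3.1), "2r-3l": (71.0, 2.9), "4r-4l": (70.2, 3.2)}
predicted = {name: predicted_flow_difference(d) for name, (d, _) in reported.items()}
```

Under these assumptions the predicted values round to the third column of the table above, and a disparity of 71.4 pixels converts to roughly 41.9 inches.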

We now turn our attention to matching using the velocity fields derived
from the polynomial fits. Using these polynomials and the known space motion,
it is possible to obtain an expression for $\delta$ as a continuous function of image
coordinates. For this example, each image has been divided into 16 rectangular
regions with dimensions of 86.4 × 94.4 pixels each. Second-order polynomials
have been fit to the velocity data in each region. The polynomials have the form

$$(v_x)_l = B_{0l} + B_{1l}\,x_l + B_{2l}\,y + B_{3l}\,x_l^2 + B_{4l}\,y^2 + B_{5l}\,x_l\,y,$$

$$(v_x)_r = B_{0r} + B_{1r}\,x_r + B_{2r}\,y + B_{3r}\,x_r^2 + B_{4r}\,y^2 + B_{5r}\,x_r\,y. \tag{26}$$

Defining the potential disparity as $\delta = (x_r - x_l)$, the velocity difference can be
expressed as

$$\Delta v_x = (B_{0r} - B_{0l}) + (B_{1r} - B_{1l})\,x_l + (B_{2r} - B_{2l})\,y + (B_{3r} - B_{3l})\,x_l^2 + (B_{4r} - B_{4l})\,y^2 + (B_{5r} - B_{5l})\,x_l\,y + (B_{1r} + B_{5r}\,y + 2B_{3r}\,x_l)\,\delta + B_{3r}\,\delta^2. \tag{27}$$

When $\delta$ is chosen correctly, this polynomial expression will equal $V_z\delta^2/b$. Thus,
we obtain a second-order polynomial in $\delta$ whose solution gives the correct
disparity value:

$$(B_{3r} - V_z/b)\,\delta^2 + (B_{1r} + B_{5r}\,y + 2B_{3r}\,x_l)\,\delta + P(x_l, y) = 0, \tag{28}$$

where $P$ corresponds to the terms in (27) that are independent of $\delta$.
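Solving (28) at a given image point is a standard quadratic solve. The following sketch assumes the six coefficients of each fit are stored in the order B0 to B5 and that all quantities share one angular unit; the function name and the synthetic example are hypothetical.

```python
import math

def disparity_from_polynomials(Bl, Br, x_l, y, vz_over_b):
    """Real roots of (28) at the left-image point (x_l, y).

    Bl, Br: coefficient lists [B0, ..., B5] of the left and right fits (26).
    vz_over_b: the known ratio V_z / b.
    """
    a = Br[3] - vz_over_b
    lin = Br[1] + Br[5] * y + 2.0 * Br[3] * x_l
    # P(x_l, y): the delta-independent terms of (27).
    P = ((Br[0] - Bl[0]) + (Br[1] - Bl[1]) * x_l + (Br[2] - Bl[2]) * y
         + (Br[3] - Bl[3]) * x_l ** 2 + (Br[4] - Bl[4]) * y ** 2
         + (Br[5] - Bl[5]) * x_l * y)
    if abs(a) < 1e-15:          # degenerate case: (28) becomes linear
        return [-P / lin] if lin != 0 else []
    disc = lin * lin - 4.0 * a * P
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted([(-lin - s) / (2.0 * a), (-lin + s) / (2.0 * a)])

# Synthetic frontal plane at Z = 45 viewed with b = 3 and V_z = 1.5 (no rotation):
# both flows are v_x = (V_z / Z) x, so only B1 is non-zero.
B = [0.0, 1.5 / 45.0, 0.0, 0.0, 0.0, 0.0]
roots = disparity_from_polynomials(B, B, 0.02, 0.01, 1.5 / 3.0)
```

In this idealized case the roots are zero and b/Z; the zero root is spurious and the positive one is the disparity, which is why the sign constraint on the disparity is still needed when selecting a solution.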

This polynomial in $\delta$ has been solved in each of the rectangles at a point 25
pixels  to  the  right  of  each  rectangle’s  center  in  the  left  image  (thus,  its  match  in 
the  right  image  will  be  to  the  left  of  center  in  the  corresponding  rectangle).  The 
disparity  should  be  the  same  everywhere  in  the  image.  The  result,  averaged  over 
the  sixteen  rectangles  is  78.8  pixels  with  a  standard  deviation  of  4.0.  This 
corresponds  to  Z  =  38.0  ±  2.0  inches,  compared  to  the  correct  value  of  45.0 
inches. 

The  source  of  the  error  is  the  misalignment  of  the  cameras,  as  can  be  seen 
from  the  velocity  fields  below.  Because  the  cameras  are  moving  toward  a  frontal 
plane,  the  focus  of  expansion  of  the  velocity  field  should  be  at  the  center  of  the 
image and the velocity components should be anti-symmetric. The velocity
component $v_x$ at the center of each rectangle, averaged over the four rectangles in
each of the four columns, is given below for the left and right images.

Average Horizontal Velocity ($v_x$)

                   x = -177    x = -59    x = 59    x =
$(v_x)_l$ avg      -5.7        -1.8       2.2       6.0
$(v_x)_r$ avg      -6.2        -2.3       1.6       5.6

Note that adding 0.3 to the values of the right image and subtracting 0.15 from
the values of the left image would leave both velocity distributions nearly
anti-symmetric. The deviations from anti-symmetry correspond to camera
misalignments of about 0.5 degrees. These corrections can also be applied to the velocity
difference calculations by adding 0.45 to the calculated value. The corrected
disparity, averaged over the sixteen rectangles as above, is 66.3 ± 4.5 pixels or
45.1 ± 3.0 inches, in agreement with the correct value of 45.0 inches. Also subtracting
0.45 from the velocity differences for the individual point combinations in Table II
brings the $\Delta v_x$ and $V_z\delta^2/b$ values into very close agreement at the correct matches.
5. MATCHING VIA LOCAL SUPPORT

In  the  case  of  a  static  stereo  pair  of  images,  many  algorithms  have  been  sug¬ 
gested  for  establishing  correspondence  between  features  (i.e.,  edges  and  points)  in 
the  left  and  right  images.  Knowledge  of  the  stereo  geometry  constrains  matches 
to  lie  along  known  epipolar  lines  (horizontal  in  the  case  of  our  parallel 
configuration).  Recently,  several  algorithms  have  emerged  which  are  based  on 
the  notion  of  local  support  of  disparity  (Prazdny  1984;  Pollard,  Mayhew  and 
Frisby  1985;  Eastman  and  Waxman  1985).  Prazdny’s  algorithm  attempts  to 
embody  the  concept  of  “coherence”  in  the  local  disparity  distribution,  by  assign¬ 
ing  a  weight  to  each  potential  match  of  a  feature  based  on  a  measure  of  similar¬ 
ity  between  that  disparity  and  potential  disparities  of  other  nearby  features.  Pol¬ 
lard  et  al.  have  developed  a  matching  algorithm  which  is  driven  by  “local  con¬ 
sistency  with  a  prescribed  disparity  gradient  limit”  of  unity  (selected  on  the  basis 
of  psychophysical  experiments).  Again,  potential  matches  of  features  are  found, 
and  the  potential  disparities  of  nearby  points  are  tested  for  compliance  with  the 
disparity  gradient  limit.  The  approach  of  Eastman  and  Waxman  is  based  on  the 
notion  of  “analytic  disparity  fields”  in  overlapping  neighborhoods.  Potential 
matches  between  contours  (i.e.,  extended  edges)  in  the  left  and  right  images  are 
established.  Then,  motivated  by  (14)  the  implied  disparities  are  fit  to  a  linear 
functional  form  (in  the  least-squares  sense)  for  potentially  matched  contours  in  a 
neighborhood,  thereby  yielding  a  locally  planar  interpretation  along  with  the 
average  residual  (measuring  goodness  of  fit).  A  match  is  then  selected  on  the 
basis of minimizing this residual (and so maximizing local support) subject to the
disparity  gradient  (derived  from  the  functional)  being  less  than  a  limit  of  unity. 
Our  use  of  locally  analytic  disparity  fields  is,  in  fact,  a  mathematical  realization 
of  “coherence.”  All  of  these  “local  support”  algorithms  may  be  implemented  in  a 
local  and  parallel  manner. 

For  our  case  of  time-varying  stereo,  we  suggest  the  use  of  the  binocular 
difference  flow-disparity  relation  (15)  to  establish  correspondence  in  our  neighbor¬ 
hoods.  Of  course,  the  static  matching  algorithms  based  on  disparity  alone  may 
be  used  as  well,  but  here  we  explore  the  additional  exploitation  of  flow  to  drive 
the  matching  procedure.  We  can  implement  the  matching  procedure  in  either  of 
two  ways,  both  of  which  embody  the  concept  of  “local  support”  for  matching  a 
neighborhood. 

Upon considering (15b) first, we see that a feature and its corresponding
match along the epipolar should have the same image velocity perpendicular to
the epipolar. This may seem to establish correspondence directly; however, it is
not very selective since the velocities themselves do not vary greatly. The
problem is that (15b) does not describe a trend of variation over a neighborhood,
though it does constrain the matching. On the other hand, (15a) is well suited
for matching with local support. If in a small neighborhood we approximate the
underlying surface as planar, then (15a) suggests that $\dot{\delta}/\delta$ is
locally a linear function of the cyclopean image coordinates. Thus, we seek local
support for the analytic form

$$\frac{\dot{\delta}}{\delta}(x, y) = C_0 + C_x x + C_y y , \tag{29}$$

where the left-hand side consists of measurements $\dot{\delta} = \Delta v_x$ and
$\delta$ for potential matches, and the coefficients $C_0$, $C_x$, $C_y$ are
determined in the least-squares sense. This approach is appropriate for matching
whole contours, where the many disparity measurements implied can be used in the
least-squares procedure. The matches which minimize the average residual are
considered as having maximum local support.
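
The least-squares step for (29) can be sketched as follows. This is only a minimal illustration, not an implementation from this paper: the function names, the `(x, y, delta_dot, delta)` sample layout, and the use of the mean absolute residual as the "average residual" are all assumptions. It uses plain Python and solves the 3x3 normal equations directly.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate neighborhood (collinear or too few points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_ratio_plane(samples):
    """Fit delta_dot/delta = C0 + Cx*x + Cy*y in the least-squares sense.

    samples: list of (x, y, delta_dot, delta) along one candidate contour match,
    with (x, y) the cyclopean image coordinates and delta nonzero.
    Returns ((C0, Cx, Cy), mean absolute residual).
    """
    rows = [(1.0, x, y) for x, y, _, _ in samples]
    vals = [dd / d for _, _, dd, d in samples]
    # Normal equations (A^T A) c = A^T v for the three coefficients.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atv = [sum(r[i] * v for r, v in zip(rows, vals)) for i in range(3)]
    c = solve3(ata, atv)
    resid = [abs(v - sum(ci * ri for ci, ri in zip(c, r)))
             for r, v in zip(rows, vals)]
    return tuple(c), sum(resid) / len(resid)
```

Among competing candidate matches for a contour, the one whose samples give the smallest returned residual would be taken as having maximum local support.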

Alternatively, one can seek matches which maximize local support in light of
Prazdny’s (1984) approach. We first establish all potential matches along
epipolars and note the value of $\dot{\delta}/\delta$ corresponding to each
potential match for each feature. We then consider, for each feature $i$, each of
its neighbors $j$ over some small area around it, and choose those matches with
values of $(\dot{\delta}/\delta)_i$ and $(\dot{\delta}/\delta)_j$ which are
closest. As (29) implies that $\dot{\delta}/\delta$ varies linearly with angular
separation, this suggests forming the quantity

$$\omega_{ij} = \frac{\left| (\dot{\delta}/\delta)_i - (\dot{\delta}/\delta)_j \right|}{s_{ij}} , \tag{30}$$

where $s_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2$. Pairs of potential matches which
support (29) will generate a value for $\omega_{ij} \sim O(C_x, C_y)$, whereas
pairs of matches which don’t support (29) lead to
$\omega_{ij} \sim O(C_0 / s_{ij}) \gg C_x$ or $C_y$. As $\omega_{ij}$ has
units of inverse time, we must adopt a local time constant $\tau_i$ and consider the
dimensionless quantity $\tau_i \omega_{ij}$ as the primary variable measuring
similarity. A reasonable choice for $\tau_i$ is
$(\dot{\delta}/\delta)_i^{-1} \sim O(C_0^{-1})$. Hence, we wish to create a support
function which is $O(1)$ when $(\tau_i \omega_{ij})^2$ is small, and then drops to
zero as $(\tau_i \omega_{ij})^2$
Still, much work remains to be done before a complete module of this type
can  be  constructed.  The  control  structure  for  the  flow  segmentation  procedure 
requires  further  development.  This  segmentation  procedure  should  be  iterative, 
with  subsequent  refinements  occurring  near  detected  flow  discontinuities.  The 
discontinuities  in  left  and  right  images  must  also  be  matched  in  order  to  establish 
gross correspondence among analytic regions. The binocular difference
flow-disparity relation, derived in Section 3, requires further testing in order to
ensure its validity under more general classes of motion than tried here. It should also
be  generalized  to  incorporate  vergence  effects.  The  matching  techniques 
described  in  Section  5  need  to  be  implemented  and  tested  in  a  variety  of  cases. 
The  ability  to  combine  evidence  in  establishing  correspondence  is  an  appealing 
aspect  of  the  approach  and  needs  to  be  implemented  as  well. 

The  possible  role  of  a  combined  stereo-motion  module,  such  as  this  one,  in 
the  human  visual  processing  task  raises  some  interesting  questions.  How  does  the 
brain  utilize  disparity  estimates  and  binocular  flow-disparity  cues  in  establishing 
correspondence?  Does  one  take  priority  over  the  other,  or  are  they  combined? 
What  happens  when  structure  from  binocular  flow  conflicts  with  structure  from 
static  stereo  (Mayhew  and  Frisby,  private  communication)?  Does  one  percept 
dominate or do we see illusions? Are there certain kinds of “head motions”
preferred for disambiguating false matches? Is there a “gradient limit” effect
associated with the coefficients of the linear terms in equation (15a)? Is it possible to
fuse  a  dynamic  stereogram  which  is  beyond  the  static  disparity  gradient  limit  of 
unity?  Perhaps  psychophysical  experiments  can  resolve  some  of  these  questions. 
6.  CONCLUSIONS

In  this  paper  we  have  outlined  a  set  of  five  steps  toward  the  development  of 
a  stereo-motion  fusion  module.  The  successful  development  of  a  complete 
module of this type has enormous potential for robotics in a dynamic
environment. It may also shed some light on the nature of the processing going on in
the  human  visual  pathway.  In  this  respect,  the  work  of  Regan  and  Beverley 
(1979) is most relevant, for their own psychophysical and neurophysiological
studies have led them to suggest the existence of neural organizations which may
“compute”  the  binocular  difference  flow  (or  relative  flow  between  the  eyes)  which 
is  so  basic  to  our  own  theory. 

The  basic  advantages  this  module  offers  over  static  stereo  are:  monocular 
detection  of  the  depth  and  orientation  discontinuities  (before  matching  is 
attempted),  use  of  a  correlation  between  binocular  difference  flow  and  disparity 
to  drive  the  matching  process  (either  independent  of,  or  in  conjunction  with 
matching  based  on  disparity  alone),  the  ability  to  refine  disparity  estimates  to 
sub-pixel  accuracy  by  considering  the  smooth  orbits  of  features  through  the  left 
and right image space-times, and the potential to focus the attention of the
matching process on the areas where new features enter the field of view. The
advantages of this module over structure from monocular motion are: the ability
to recover absolute structure and rigid body motions (without scale factor
ambiguities), and that only linear equations need be solved to recover rigid body
motion parameters.
above. One can require either independent confirmation of a match (both
processes running in parallel lead to the same conclusion), or combined evidence
of a match based on redundant support (using the product of independent
support functions, hence the logical “AND”) or complementary support (using the
sum of independent support functions, hence the logical “OR”). This method of
combining evidence for matching awaits implementation as well.
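
The three ways of combining evidence can be written down in a few lines. The sketch below is only illustrative; the function names are hypothetical, and the support values are assumed to be non-negative numbers produced by independent support functions (for example, one driven by disparity and one by the flow-disparity ratio).

```python
import math


def combine_and(supports):
    """Redundant support: product of independent support values (logical AND)."""
    return math.prod(supports)


def combine_or(supports):
    """Complementary support: sum of independent support values (logical OR)."""
    return sum(supports)


def independently_confirmed(match_a, match_b):
    """Independent confirmation: two matchers run in parallel must agree."""
    return match_a == match_b
```

With the product, a candidate that receives no support from either cue is rejected outright, whereas with the sum a strong response from one cue can carry a candidate on which the other cue is silent.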
grows. The function should seek support over only a local neighborhood around
feature $i$. Denoting this function by $W(\tau_i \omega_{ij})$, we form
$\sum_j W(\tau_i \omega_{ij})$ over the neighborhood and select the match for
feature $i$ with value $(\dot{\delta}/\delta)_i$ that generated the largest
percentage of the sum; it is most similar to its neighbors in a manner
consistent with the linear form (29).
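
A minimal Python sketch of this selection rule follows. Several choices in it are assumptions made purely for illustration and are not specified in the text: the Gaussian shape of the support function, the `radius` and `width` parameters, the rule that each neighbor contributes the best support among its own candidates (in the spirit of Prazdny's scheme), and the `(x, y, candidates)` feature layout. Candidate values of the ratio are assumed to be nonzero, and every feature is assumed to have at least one candidate.

```python
import math


def support(tau_omega, width=1.0):
    """Assumed support function W: about 1 for small (tau*omega)^2, decaying to 0."""
    return math.exp(-(tau_omega ** 2) / (2.0 * width ** 2))


def select_matches(features, radius=1.0, width=1.0):
    """Pick, for every feature, the candidate with the most local support.

    features: list of (x, y, candidates), where candidates holds one value of
    delta_dot/delta per potential match of that feature.
    Returns the index of the chosen candidate for each feature.
    """
    chosen = []
    for i, (xi, yi, cands_i) in enumerate(features):
        scores = []
        for ri in cands_i:
            tau = 1.0 / abs(ri)              # local time constant for this candidate
            total = 0.0
            for j, (xj, yj, cands_j) in enumerate(features):
                s = math.hypot(xi - xj, yi - yj)
                if j == i or s == 0.0 or s > radius:
                    continue                 # support only from the local neighborhood
                # omega = |r_i - r_j| / s; keep the neighbor's most supportive candidate
                total += max(support(tau * abs(ri - rj) / s, width) for rj in cands_j)
            scores.append(total)
        chosen.append(max(range(len(scores)), key=scores.__getitem__))
    return chosen
```

Because every candidate of a feature is scored against the same set of neighbors, choosing the largest summed support is equivalent to choosing the largest share of the total.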

This is essentially Prazdny’s algorithm, adapted to the variable
$\dot{\delta}/\delta$. Clearly, it is applicable to any variable which can be
locally approximated as a linear form, including disparity itself. Such a matching
strategy leads naturally to a preference for small gradients in the matching
variable. Thus, a kind of “gradient limit” emerges. This is well known for
disparity alone in static stereograms (Burt and Julesz 1980). But does such a
gradient limit exist for dynamic stereograms? Could fusion be achieved with a
dynamic stereogram for which the disparity gradient limit is exceeded?

We  have  yet  to  implement  our  matching  strategy  and  so  cannot  comment  on 
its  possible  strengths  or  weaknesses.  But  in  keeping  with  Step  5  of  Section  1,  we 
expect  that  once  correspondence  is  initially  established,  new  features  emerging 
from  behind  occluding  boundaries  and  the  periphery  are  easily  matched.  They 
are  entrained  into  the  local  disparity  field  by  a  spreading  of  local  support  from 
previously  matched  features  in  the  neighborhood. 

Finally, we can consider the possibility of combining multiple matching
criteria. For example, disparity $\delta$ and the ratio $\dot{\delta}/\delta$ may
both be used to establish correspondence, and can both be implemented in the same
fashion as outlined